Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise
Abstract
An analysis-based non-linear feature extraction approach is proposed, inspired by a model of how speech amplitude spectra are affected by additive noise. Acoustic features are extracted from the noise-robust parts of speech spectra without losing discriminative information. Two non-linear processing methods, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize the mismatch between clean and noisy speech features. A previously studied method, peak isolation [IEEE Transactions on Speech and Audio Processing 5 (1997) 451], is also discussed in terms of this model. These methods do not require noise estimation and are effective with both stationary and non-stationary noise. ASR experiments show that, in the presence of additive noise, using these techniques in the computation of MFCCs greatly improves recognition performance. For the TI46 isolated digits database, the average recognition rate across several SNRs improves from 60% (using unmodified MFCCs) to 95% (using the proposed techniques) with additive speech-shaped noise. For the Aurora 2 connected digit-string database, the average recognition rate across different SNRs and noise types, including non-stationary background noise, improves from 58% to 80%. © 2003 Elsevier Science Ltd. All rights reserved.
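The idea that some parts of a speech spectrum are more noise-robust than others can be illustrated with a small numerical sketch. It is not the paper's model or method; it assumes only that speech and noise are uncorrelated, so that their powers add in each frequency bin. The function name and the bin powers below are hypothetical values chosen for illustration.

```python
import math


def log_spectral_shift_db(speech_power, noise_power):
    """Return the dB increase of one spectral bin after noise is added.

    Assumption (not taken from the paper): speech and noise are
    uncorrelated, so the noisy bin power is the sum of the two powers.
    """
    return 10.0 * math.log10((speech_power + noise_power) / speech_power)


# Hypothetical bin powers in arbitrary linear units.
noise = 1.0
peak = 1000.0   # spectral peak, 30 dB above the noise level
valley = 0.1    # spectral valley, 10 dB below the noise level

peak_shift = log_spectral_shift_db(peak, noise)
valley_shift = log_spectral_shift_db(valley, noise)

print(f"shift at the peak:   {peak_shift:.3f} dB")
print(f"shift at the valley: {valley_shift:.3f} dB")
```

Under this assumption the peak barely moves on a log scale while the valley is raised by more than 10 dB. This is one way to see why features built on spectral peaks could show less mismatch between clean and noisy speech.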
Similar resources
Dynamic Robust Speech Recognition
Robust recognition theory has become one of the research focuses of acoustic speech recognition. An acoustic speech digital signal is a random process in which stationary pieces repeatedly alternate with non-stationary pieces. However, neither the current linear and stationary characteristic parameters drawn from such signals nor the rigid recognition models adapt to such repeatedly alternating property o...
Robust Speech Recognition Features Based on Temporal Trajectory Filtering and Non-Uniform Spectral Compression
This paper proposes a new feature extraction method based on temporal trajectory filtering and non-uniform spectral compression and examines its performance on two tasks in noisy environments. Temporal trajectory filtering is effective for robust speech recognition in noisy environments, because human hearing is more sensitive to relative values than to absolute values and the effect of add...
Speaker feature extraction from pitch information based on spectral subtraction for speaker identification
Robust speaker feature extraction under noise conditions is an important issue for application of a speaker recognition system. It is well known that LPC cepstrum, which expresses the spectral envelope, is effective for speaker recognition. This implies that the spectral rough structure is effective for speaker recognition. However, LPC cepstrum is a noise-sensitive feature. On the other hand, sp...
Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments
In the current ASR systems the presence of competing speakers greatly degrades the recognition performance. This phenomenon is getting even more prominent in the case of hands-free, far-field ASR systems like the “Smart-TV” systems, where reverberation and non-stationary noise pose additional challenges. Furthermore, speakers are, most often, not standing still while speaking. To address these ...
Adaptive Enhancement of Speech Signals for Robust ASR
Behavior of the least squares filter (LeSF) is analyzed for a class of non-stationary signals that are composed of multiple sinusoids whose frequencies and the amplitudes may vary from block to block and which are embedded in white noise. Analytic expressions for the weights and the output of the LeSF are derived as a function of the block length and the signal SNR computed over the correspondi...
Journal title:
- Computer Speech & Language
Volume 17
Publication date: 2003